Inverse Reinforcement Learning from a Gradient-based Learner
Inverse Reinforcement Learning addresses the problem of inferring an expert's reward function from demonstrations. However, in many applications, we not only have access to the expert's near-optimal behaviour, but we also observe part of her learning process. In this paper, we propose a new algorithm for this setting, in which the goal is to recover the reward function being optimized by an agent, given a sequence of policies produced during learning. Our approach is based on the assumption that the observed agent is updating her policy parameters along the gradient direction. Then we extend our method to deal with the more realistic scenario where we only have access to a dataset of learning trajectories. For both settings, we provide theoretical insights into our algorithms' performance. Finally, we evaluate the approach in a simulated GridWorld environment and on the MuJoCo environments, comparing it with the state-of-the-art baseline.
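The gradient-direction assumption in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes, for illustration only, that the reward is linear in some features with weights `w`, so that each observed parameter change is approximately a step size times a Jacobian of the expected feature counts times `w`. The name `recover_reward_weights`, the synthetic random Jacobians, and the fixed step size `alpha` are all hypothetical. Under these assumptions the weights can be estimated, up to a positive scale, by ordinary least squares over the observed updates.

```python
import numpy as np

def recover_reward_weights(deltas, jacobians):
    """Least-squares estimate of the reward weights w from updates
    delta_t ~= alpha * J_t @ w. The unknown positive step size alpha is
    absorbed into w, so the result is returned with unit norm."""
    A = np.vstack(jacobians)        # stacked Jacobians, shape (T*d, q)
    b = np.concatenate(deltas)      # stacked parameter changes, shape (T*d,)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Synthetic learner (assumption): d policy parameters, q reward features, T updates.
rng = np.random.default_rng(0)
d, q, T = 6, 3, 5
true_w = np.array([0.6, 0.0, 0.8])  # unit-norm reward weights to recover
alpha = 0.1                          # learner's step size, unknown to the observer

# Random matrices stand in for the Jacobians of expected feature counts.
jacobians = [rng.normal(size=(d, q)) for _ in range(T)]
# Each observed update follows the gradient direction exactly (noise-free case).
deltas = [alpha * J @ true_w for J in jacobians]

w_hat = recover_reward_weights(deltas, jacobians)
```

In this noise-free toy case the estimate matches the true weights after normalization. With only trajectories available, the Jacobians would themselves have to be estimated from data, which is the harder setting the abstract mentions.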
Review for NeurIPS paper: Inverse Reinforcement Learning from a Gradient-based Learner
Weaknesses: I have several concerns about the proposed approach. First, the empirical results send mixed messages. In one of the three tasks (Reacher), the LfL baseline significantly outperforms LOGEL (Figure 4, left), whereas in another (Hopper), the policy trained with the reward function recovered by LOGEL outperforms the policy trained on the true reward function. And what kind of reward function does the LfL baseline recover for the Hopper task that leads to no learning at all?
Review for NeurIPS paper: Inverse Reinforcement Learning from a Gradient-based Learner
Drawing on inverse RL, the submission proposes learning from an expert that is itself using a learning process to optimize its reward. In the initial reviews, three of four reviewers were positive on the submission, and after seeing the author feedback, one of the reviewers was persuaded to raise the overall score, so that the current scores are now (7, 7, 6, 5). With these scores, the submission is likely (but not guaranteed) to be accepted to NeurIPS. Regardless, it is important that you address all of the issues raised by the reviewers in the next version of the manuscript, and we trust that you will.